# Multimodal Visual Representation
Webssl Mae1b Full2b 224
A 1-billion-parameter Vision Transformer trained with masked autoencoder (MAE) self-supervised learning on 2 billion web images. It learns visual representations without language supervision.
Image Classification
Transformers

facebook
36
0
RADIO B
RADIO is a vision foundation model developed by NVIDIA Research, capable of unifying visual information across different domains for various vision tasks.
Image Segmentation
Transformers

nvidia
999
3
Featured Recommended AI Models